ABSTRACT

We study analytically the possibility that mergers of haloes are more highly clustered than 
the general population of haloes of comparable masses. We begin by investigating predictions 
for merger bias within the extended Press-Schechter formalism and discuss the limitations 
and ambiguities of this approach. We then postulate that mergers occur whenever two ob- 
jects form within a (small) fixed distance of each other. We therefore study the clustering of 
pairs of points for a highly biased population in the linear regime, for the overall mass dis- 
tribution in the quasilinear regime, and (using the halo model of clustering) in the nonlinear 
regime. Biasing, quasilinear evolution, and nonlinear clustering all lead to nonzero reduced 
(or connected) three-point and four-point correlation functions. These higher-order correla- 
tion functions can in many cases enhance the clustering of close pairs of points relative to 
the clustering of individual points. If close pairs are likely to merge, then the clustering of 
mergers may be enhanced. We discuss implications for the observed clustering of luminous 
z = 3 galaxies and for correlations of active galactic nuclei and galaxy clusters. 
1 INTRODUCTION

Galaxy clustering can be a useful tool to study the origin of large- 
scale structure and to delineate the formation mechanisms of vari- 
ous types of galaxies. For example, it is now well appreciated that 
objects forming from rare high-density peaks in the primordial den- 
sity distribution, such as bright galaxies at high redshifts or galaxy 
clusters today, should be "biased" (i.e., more highly clustered) relative
to the more common lower-mass objects that more closely
trace the total-mass distribution (Kaiser 1984).

A currently unanswered question is whether the growth his- 
tory of haloes can affect their clustering properties. Cosmological 
simulations give confusing results. Kolatt et al. (1999) argued that
merger-driven starbursts at z ~ 3 occur in small haloes that lie
near larger ones: thus they are more highly clustered than typical
objects of the same mass (see also Wechsler et al. 2001). The
simulations of Gottlöber et al. (2002) showed different clustering
at z = 0 between objects that had experienced a major merger
and those that had not. Kauffmann & Haehnelt (2002) also found
a weak enhancement in the cross-correlation between objects
undergoing major mergers and the general population, but only at
small scales. On the other hand, Percival et al. (2003) found no
evidence for excess merger bias at z = 0, where recently-merged
objects were identified as haloes in which at least 50 per cent of
constituent particles were not in a progenitor of at least equal mass
at a fixed earlier redshift. Scannapieco & Thacker (2003) agreed at
z = 3, but if they modified the criterion to include all haloes that
grew by 20 per cent or more (implicitly including smooth infall),
the rapidly-growing sample had a substantial excess bias, making
their clustering comparable to that of haloes with three times more
mass. Most recently, Gao, Springel, & White (2005) examined a
high-dynamic-range N-body simulation at z = 0. They found the
clustering of low-mass recently-merged objects to be suppressed
relative to the average. For example, in their lowest-mass bin (with
a mass $\approx 2\%$ of the characteristic halo mass), the 20% youngest
and oldest haloes are under- and over-biased by ~ 40%, respectively.
On the other hand, in agreement with Percival et al. (2003),
they found that the clustering of more massive objects is nearly
independent of their age. The verdict is clearly not yet in: how can
we reconcile these disparate results?

The question is not just academic. Clustering is often used 
to infer information about the host halo mass of particular galaxy
populations (e.g., Mo & Fukugita 1996; Adelberger et al. 1998;
Giavalisco et al. 1998). The possibility that clustering depends on
the merger history — which obviously also strongly affects observ- 
ables such as the star-formation history — would call such infer- 
ences into question. One example is the discrepancy between the 
masses ($\sim 10^{12}\,M_\odot$) of Lyman-break galaxies (LBGs) inferred
from their clustering (Coles et al. 1998; Giavalisco & Dickinson
2001; Porciani & Giavalisco 2002; Adelberger et al. 2005) and
the dynamical masses ($\sim 10^{11}\,M_\odot$) inferred from the broadening
of nebular emission lines and kinematics (Pettini et al. 2001;
Erb et al. 2003). This claimed discrepancy may simply be the
difference between the mass in the central regions and the total
mass (Erb et al. 2003; Cooray 2005), but Wechsler et al. (2001)
and Scannapieco & Thacker (2003) have proposed that it may
also point to "merger bias" if LBGs are galaxies that have
recently undergone mergers. The problem is even more extreme for
submillimetre-selected galaxies at z > 2: their dynamics imply
total masses $M < 10^{12}\,M_\odot$ while clustering implies $M > 10^{13}\,M_\odot$
(Blain et al. 2004).

More generally, to what extent does clustering depend on 
factors other than the halo mass? Will selection techniques that 
trace recent episodes of star formation (such as Lyman-break or 
Ly$\alpha$ line selection) yield more highly clustered objects than
techniques sensitive to the total stellar mass (such as infrared
observations), even if the typical halo masses in the surveys are identical?
Quasars and other active galactic nuclei (AGN) may also be
triggered by galaxy mergers. Their clustering has been used to infer
the properties of the host galaxy (La Franca, Andreani, & Cristiani
1998; Adelberger & Steidel 2005b) and of the quasar (especially
its lifetime: Haiman & Hui 2001; Martini & Weinberg 2001;
Adelberger & Steidel 2005a). How will the bias of mergers (if it
exists) affect such estimates? Will recently-merged galaxy clusters 
trace the underlying mass distribution differently than relaxed clus- 
ters? All of these questions have implications for our understand- 
ing of both galaxy formation and the large-scale structure of the 
universe. 

In this paper, we take an analytic approach that complements 
the numerical studies and may aid in their interpretation. We begin 
in Section 2 by considering the question of "merger bias" within the
context of the widely-used linear-bias model. We show that exist- 
ing techniques cannot adequately answer this question, so we then 
go on to consider other approaches. To be more precise, in Sections 
3-7, we derive analytic results for the clustering of close pairs of 
galaxies in several clustering models. We consider the clustering of 
close pairs when galaxies Poisson sample (a) the overall mass in 
a Gaussian random field; (b) the high-density peaks in a primor- 
dial Gaussian random field; (c) the overall mass in the quasilinear 
regime; and (d) the overall mass in the nonlinear regime described 
by the halo-clustering model. We find that the clustering of close 
pairs of galaxies can be enhanced, sometimes significantly, relative 
to the galaxies in many of these cases. We speculate that if close 
pairs are likely to merge, then a pair bias will imply a merger bias, 
although we do not make this statement precise. If a pair bias does 
in fact lead to a merger bias, then our results are consistent with a 
solution to the LBG puzzle. We also briefly discuss other observ- 
able implications of our results. 



2 A FIRST LOOK AT MERGER BIAS 

We will first attempt to compute the bias of merging objects via
their number densities and the "peak-background split" approach
to bias (Efstathiou et al. 1988; Cole & Kaiser 1989; Mo & White
1996). We define the number density $n_m\,dm_1\,dm_2$ of mergers
between haloes in the mass range $m_1 \to m_1 + dm_1$ and those in the
mass range $m_2 \to m_2 + dm_2$ via

\begin{equation}
n_m(m_1, m_2, z) = n(m_1, z)\, n(m_2, z)\, Q(m_1, m_2, z)\, \Delta t, \tag{1}
\end{equation}

where $n(m, z)\,dm$ is the comoving number density, at redshift $z$, of
haloes with masses $m \to m + dm$ and $Q(m_1, m_2, z)$ is the merger
kernel with units of volume per unit time. We take $\Delta t$ to be some
finite time interval within which the mergers of interest take place;
note that we assume it to be sufficiently small that the underlying
halo populations do not evolve significantly.

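To compute the bias, we simply need to know how each of
these terms varies (to linear order) with the mean density $\delta$ in some
large patch. For example, the Press & Schechter (1974) mass function
is

\begin{equation}
n(m, z) = \sqrt{\frac{2}{\pi}}\, \frac{\bar{\rho}}{m^2}\, \frac{\delta_c(z)}{\sigma} \left| \frac{d \ln \sigma}{d \ln m} \right| \exp\left[ -\frac{\delta_c^2(z)}{2 \sigma^2} \right], \tag{2}
\end{equation}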



where $\delta_c$ is the fractional-overdensity threshold for spherical
collapse, $\bar{\rho}$ is the mean background density, and $\sigma^2$ is the
fractional-density variance smoothed on scale $m$. Note that we follow the
convention in which $\sigma$ is independent of redshift, while $\delta_c(z)$ is
the (linear-extrapolated) density threshold at redshift $z$. This
distribution can be derived in terms of a diffusion problem in $(\sigma^2, \delta)$
space with an absorbing barrier at $\delta = \delta_c$ (Bond et al. 1991). Such
an approach makes it obvious that the abundance of haloes in a
region of (linear-extrapolated) overdensity $\delta$ and mass $M$
(corresponding to $\sigma_M$) will take the same form, but with a shift in the
origin (Lacey & Cole 1993):

\begin{equation}
n(m, z | \delta, M) = \sqrt{\frac{2}{\pi}}\, \frac{\bar{\rho}}{m^2}\, \frac{\sigma^2 [\delta_c(z) - \delta]}{(\sigma^2 - \sigma_M^2)^{3/2}} \left| \frac{d \ln \sigma}{d \ln m} \right| \exp\left\{ -\frac{[\delta_c(z) - \delta]^2}{2 (\sigma^2 - \sigma_M^2)} \right\}. \tag{3}
\end{equation}

To find the linear bias, Mo & White (1996) first take the
large-scale limit $M \to \infty$ (or $\sigma_M \to 0$). The overdensity of haloes in a
region of physical volume $V$ is

\begin{equation}
\delta_h = \frac{n(m, z | \delta)\, V\, (1 + \delta_z)}{n(m, z)\, V} - 1, \tag{4}
\end{equation}



where $\delta_z$ is the true overdensity at redshift $z$ (without linear
extrapolation) and the $(1 + \delta_z)$ factor in the numerator accounts for the
fact that an overdense region is larger in Lagrangian space than in
physical space. Expanding equation (4) to linear order, we find

\begin{equation}
\delta_h = \left[ 1 + \frac{\nu^2 - 1}{\delta_c(z = 0)} \right] \delta_z + O(\delta_z^2) \equiv b_h(m, z)\, \delta_z + O(\delta_z^2), \tag{5}
\end{equation}

where we have let $\nu = \delta_c(z)/\sigma$. This defines the usual bias
$b_h(m, z)$ for haloes of mass $m$ at redshift $z$.
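
As a quick numerical illustration of equation (5), the sketch below evaluates the linear halo bias for a few values of $\nu$. It assumes the standard Mo & White (1996) form $b_h = 1 + (\nu^2 - 1)/\delta_c(z=0)$ and the Einstein-de Sitter collapse threshold $\delta_c(z=0) = 1.686$; the function name `halo_bias` and the constant `DELTA_C0` are illustrative choices, not part of the original text.

```python
# Assumed spherical-collapse threshold at z = 0 (Einstein-de Sitter value).
DELTA_C0 = 1.686


def halo_bias(nu, delta_c0=DELTA_C0):
    """Linear halo bias b_h = 1 + (nu^2 - 1) / delta_c(z=0), as in equation (5)."""
    return 1.0 + (nu**2 - 1.0) / delta_c0


# Haloes with nu > 1 are biased (b_h > 1); haloes with nu < 1 are antibiased.
for nu in (0.5, 1.0, 2.0, 3.0):
    print(f"nu = {nu:.1f}  ->  b_h = {halo_bias(nu):.2f}")
```

The bias crosses unity at $\nu = 1$, which is why the sign of any excess clustering discussed later depends on whether the haloes lie above or below the characteristic mass.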



2.1 The extended Press-Schechter merger kernel 

To compute the merger bias, we need to perform a similar expansion
on the kernel $Q$. The usual model for this quantity comes
from the extended Press-Schechter merger rates of Lacey & Cole
(1993). Unfortunately, as we will see explicitly below, this formalism
is inherently unable to address our problem: the large-scale
bias of mergers disappears from the calculation. Letting $S = \sigma^2$,
Lacey & Cole (1993) define $f(S_1, \delta_{c1} | S_T, \delta_{cT})$ to be the fraction
of excursion-set trajectories that first cross $\delta_{c1} > \delta_{cT}$ at $S_1 > S_T$,
given that they first cross $\delta_{cT}$ at $S_T$ (here the subscript $T$ refers
to the total mass). This is exactly equivalent to $n(m, z | \delta, M)$ in
equation (3) with the identifications $(S_1 \leftrightarrow m)$, $(S_T \leftrightarrow M)$,
$\delta_{c1} = \delta_c(z)$, and $\delta_{cT} = \delta$; the only difference is that here we
assume $M$ to be in a collapsed halo at a later redshift. To obtain
the merger rate, we will need $f(S_T, \delta_{cT} | S_1, \delta_{c1})$ instead: given a
halo at some early time, this function describes the distribution of
objects to which that halo can belong at some later time. By Bayes'
theorem, it is simply

\begin{equation}
f(S_T, \delta_{cT} | S_1, \delta_{c1})\, dS_T = f(S_1, \delta_{c1} | S_T, \delta_{cT})\, \frac{f(S_T, \delta_{cT})}{f(S_1, \delta_{c1})}\, dS_T, \tag{6}
\end{equation}

where $f(S, \delta_c)$ is the unconditional first-crossing distribution (i.e.,
the normal Press-Schechter halo mass function). The extended
Press-Schechter formalism defines $d^2p/dm\,dt$, the probability that
a halo of mass $m_1$ will merge with an object of mass $m_2 = m_T - m_1$
within an infinitesimal time interval $dt$, from the limit of
this distribution as $\delta_{cT} \to \delta_{c1}$. In other words, it is the probability
that the object will join a larger halo in the time interval of interest.
The total merger rate $n_m(m_1, m_2)$ is then this limit (transformed to
mass and time units) multiplied by $n(m_1)$.

For our problem, we need to know the dependence of each
of these quantities on the large-scale density $\delta_b$ (defined over some
mass with $S_b \ll S_1, S_T$). The unconditional distributions are easy:
$f(S, \delta_c) \to f(S, \delta_c | S_b, \delta_b)$, just like the conditional mass function.
We are thus left with the progenitor distribution $f(S_1, \delta_{c1} | S_T, \delta_{cT})$
within the large-scale region. Recall, however, that this distribution
follows from a diffusion problem with origin $(S_T, \delta_{cT})$. It must
therefore be independent of the behavior on scales $S_b < S_T$; we
only need to know that it passes $\delta_{cT}$ for the first time at $S_T$ to
compute the progenitor distribution. This step is obviously where the
extended Press-Schechter formalism fails: it is completely unable
to incorporate the large-scale environment of merger events, so it
cannot make predictions about their bias. To see this explicitly, we
calculate how merger densities vary with $\delta_b$:

\begin{align}
n_m(m_1, m_2 | \delta_b) &\propto n(m_1 | \delta_b)\, \frac{d^2 p(\delta_b)}{dm\, dt}\, \Delta t \tag{7} \\
&\propto f(S_1, \delta_{c1} | \delta_b) \times \frac{f(S_T, \delta_{cT} | \delta_b)}{f(S_1, \delta_{c1} | \delta_b)} \tag{8} \\
&\propto f(S_T, \delta_{cT} | \delta_b). \tag{9}
\end{align}


Thus, according to the extended Press-Schechter model, n m varies 
with density in precisely the same way as the number density of 
haloes with the same final mass $m_T$. Clearly there is no merger
bias in this picture, but only because the formalism is unable to 
address the relevant question. 

Thus the conclusion of this model is not one that we can 
trust. In addition to this difficulty, there is the deeper one pointed
out by Benson et al. (2005), who showed that the extended Press-
Schechter merger rates are mathematically self-inconsistent (call- 
ing into question the association of trajectory jumps with mergers). 
While it has proven useful in a variety of contexts for galaxy for- 
mation, the extended Press-Schechter formalism is manifestly not 
appropriate for investigating merger bias. 



2.2 A density-independent merger kernel 

Unfortunately, at this time, there are no fully developed alternatives 
to the extended Press-Schechter formalism (but see Benson et al.
2005 for first steps in this direction). We therefore obviously cannot
compute the variation of Q with the large-scale density. Instead we 
will consider the simplest possible model. We will assume that the 
merger kernel Q is independent of environment in the Lagrangian 
space to which the Press-Schechter formalism is native: that is, 
the merger rate varies with the local density only through the La- 
grangian number density of haloes. This would be appropriate if, 
for example, all Gaussian peaks within a fixed comoving distance 
merged with each other, and if we neglect extra correlations be- 
tween neighboring haloes. In other words, we treat each of the two 
haloes independently of the other; clearly, this is not completely 
correct, because the large-scale bias does not describe the small- 
scale correlations between haloes (e.g., Scannapieco & Barkana
2002). We emphasize, then, that our model is not meant to be
quantitatively accurate but only to illuminate the dependence of the
merger bias on the halo abundances.

Figure 1. Merger bias at z = 3. The dot-dashed line shows the normal
halo bias $b_h$ for the final merger product. The thin solid, long-dashed, and
short-dashed curves take $m_2/m_1 = 1$, 0.5, and 0.1, respectively.

In this case, we define the overdensity of mergers via



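\begin{equation}
\delta_m = \frac{N_m(m_1, m_2, z | \delta)}{n_m(m_1, m_2, z)\, V} - 1, \tag{10}
\end{equation}

where $N_m$ is the number of mergers in this volume. Clearly $N_m \propto
n(m_1 | \delta)\, n(m_2 | \delta)\, V\, (1 + \delta_z)$. Expanding to linear order, we find a
merger bias

\begin{equation}
b_m = 1 + \frac{\nu_1^2 + \nu_2^2 - 2}{\delta_c(z = 0)}, \tag{11}
\end{equation}

where $\nu_1 = \nu(m_1)$, etc.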

For a given final mass (parametrized by $\nu$), we can then compute the bias of
mergers as a function of the mass ratio. We show some results at
z = 3 in Figure 1 as a function of $\nu_{\rm fin} = \nu(m_1 + m_2)$.
Interestingly, in this model, $b_m > b_h$ for $\nu \gg 1$: mergers between
massive objects tend to occur in denser regions than an average
halo of the final mass (or in other words, younger systems are more
biased than older systems). The behavior reverses at small masses:
younger systems are less biased than average. Qualitatively, a
dark-matter particle in a halo with $\nu < 1$ must be in a low-density
environment; small-mass objects that have just formed will typically
be in lower-density environments than an average halo of this type.

Figure 2 shows the ratio between the merger and halo bias at
both z = 3 and z = 0. Note that it appears to asymptote to a
constant at large $\nu$. This is simply $b_m/b_h \to (\nu_1^2 + \nu_2^2)/\nu_{\rm fin}^2$; the
excess bias will thus disappear when one progenitor contains nearly
all of the final mass. Also, $b_m$ can become negative for sufficiently
small mass mergers: such events preferentially occur in underdense
environments. Note also that in this model the merger bias at fixed
$\nu_{\rm fin}$ depends on redshift, even though the halo bias does not; this
is because (for a fixed mass ratio) the ratio $\nu_1/\nu_2$ does depend on
redshift through the scale dependence of the effective slope of the
cold-dark-matter (CDM) power spectrum.

Figure 2. Ratio between the merger bias $b_m$ and the halo bias $b_h$ (of
the final product). The solid, long-dashed, and short-dashed curves take
$m_2/m_1 = 1$, 0.5, and 0.1, respectively. The upper thick and lower thin
sets of curves take z = 3 and z = 0, respectively.



Of course, it is not obvious that taking Q to be constant in 
Lagrangian space is the most reasonable assumption. We could in- 
stead have taken it to be independent of environment in physical 
(Eulerian) coordinates. Then the appropriate bias would be 

\begin{equation}
b'_m = b_h(m_1) + b_h(m_2) = b_m + 1, \tag{12}
\end{equation}

because in this case $Q \propto (1 + \delta_z)$ when expressed in the
Lagrangian space. This would be appropriate if, for example, mergers
occurred only through random collisions in physical space. In this
approximation, mergers are even more biased for large $\nu$ and less
antibiased for small $\nu$. It is not clear which of these assumptions is
more physically plausible, but interestingly they both predict positive
bias ($b_m/b_h > 1$) for mergers of massive haloes and antibias
($b_m/b_h < 1$) for mergers of sufficiently small haloes.
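
The behaviour of equations (11) and (12) can be explored with a toy calculation. The sketch below is not the CDM calculation behind Figures 1 and 2: it assumes a pure power-law variance, $\sigma(m) \propto m^{-(n_{\rm eff}+3)/6}$, so that $\nu(m) \propto m^{(n_{\rm eff}+3)/6}$, with a hypothetical effective spectral index `n_eff` and $\delta_c(z=0) = 1.686$. The function name and parameters are illustrative only.

```python
def merger_bias(nu_fin, q, n_eff=-2.0, delta_c0=1.686):
    """Toy merger bias for a merger with mass ratio q = m2/m1 and final nu = nu_fin.

    Assumes nu(m) is a power law in mass with slope (n_eff + 3) / 6.
    Returns (b_m, b_m_prime, b_h): the Lagrangian-kernel bias of equation (11),
    the Eulerian-kernel bias of equation (12), and the halo bias of the product.
    """
    gamma = (n_eff + 3.0) / 6.0
    nu1 = nu_fin * (1.0 / (1.0 + q)) ** gamma  # progenitor 1: m1 / (m1 + m2)
    nu2 = nu_fin * (q / (1.0 + q)) ** gamma    # progenitor 2: m2 / (m1 + m2)
    b_m = 1.0 + (nu1**2 + nu2**2 - 2.0) / delta_c0
    b_m_prime = b_m + 1.0
    b_h = 1.0 + (nu_fin**2 - 1.0) / delta_c0
    return b_m, b_m_prime, b_h


for nu_fin in (0.5, 1.0, 3.0):
    b_m, b_m_prime, b_h = merger_bias(nu_fin, q=1.0)
    print(f"nu_fin = {nu_fin:.1f}: b_m = {b_m:.2f}, b'_m = {b_m_prime:.2f}, b_h = {b_h:.2f}")
```

Even in this simplified setting, equal-mass mergers are more biased than the final halo at large $\nu_{\rm fin}$ and less biased at small $\nu_{\rm fin}$; the size of the effect depends on the assumed slope.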

Comparison to the simulation results illuminates some of the 
properties of $Q$ appropriate to halo growth. Gao, Springel, & White
(2005) found that, for small haloes at z = 0, younger objects are
less biased than average. This fits, at least qualitatively, with our
$Q$ = constant results, which predict $b_m < b_h$ for $\nu < 1$. However,
they also found no evidence for age-dependent clustering in massive
objects (see also Percival et al. 2003). This is in conflict with
the $Q$ = constant results, which predict a 10-20% enhancement to
the merger bias for large $\nu$. Taken at face value, this implies that
the merger rate of massive objects must be suppressed in dense
regions. On the other hand, Scannapieco & Thacker (2003) claimed
a positive merger bias for massive haloes in simulations at z = 3.
The $Q$ = constant model provides an important clue that may
explain this apparent redshift evolution: it does indeed predict a larger
merger bias at early times. The reason is that the merger bias
depends on $\nu(m_1) + \nu(m_2)$ and not simply $\nu(m_1 + m_2)$. The
characteristic scale of the mass function grows with time; because the
CDM power spectrum is not a simple power law, the relation be- 
tween these three quantities changes with time. Thus, although the 
halo bias at a fixed v is independent of redshift, the bias of major 
mergers need not be. 



3 CLUSTERING OF PAIRS 

The last Section showed that, until we have a self-consistent merger 
kernel Q that correctly incorporates the density dependence of the 
merger rates, we cannot properly predict the linear merger bias 
within the Press-Schechter model. It is therefore worth considering 
other approaches to merger bias to see what light they can shed. In 
this and the following Sections, we will examine a picture in which 
mergers simply correspond to closely spaced objects. Intuitively, 
such pairs may merge because of (for example) nonlinear gravita- 
tional collapse that brings objects closer together. We will consider 
how close pairs are biased relative to the objects themselves and 
show that, in general, the pair bias differs from the halo bias. 

Consider a population of galaxies with mean spatial density n. 
Then the differential probability to find a galaxy in an infinitesimal 
volume element dV is dP = ndV. The differential probability 
to find one galaxy in $dV_1$ centered on a position $\vec r_1$ and another in
$dV_2$ centered on $\vec r_2$ is $dP = n^2\, dV_1\, dV_2\, [1 + \xi(|\vec r_1 - \vec r_2|)]$, where
$\xi(r)$ is the galaxy-galaxy autocorrelation function. The correlation
function is the excess probability, over random, to find two galaxies 
in differential volume elements separated by a distance r. 

There can never be more than one galaxy in an infinitesimal 
volume element $dV$. However, we will soon deal with close pairs
of galaxies. We will thus want to know the probability to find two
galaxies in one small, but finite, volume element $\delta V$. To be precise,
we take this volume element to be a sphere of radius $r_p$; then
$\delta V = (4\pi/3)\, r_p^3$. The desired probability is then

\begin{equation}
\delta P = n^2 \int_{\delta V} d^3 r_1 \int_{\delta V} d^3 r_2\, \left[ 1 + \xi(|\vec r_1 - \vec r_2|) \right] = n^2 (\delta V)^2 \left( 1 + \langle \delta_p^2 \rangle \right), \tag{13}
\end{equation}



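where

\begin{align}
\langle \delta_p^2 \rangle &= (\delta V)^{-2} \left\langle \left[ \int_{\delta V} d^3 r\, \delta(\vec r - \vec x) \right]^2 \right\rangle \nonumber \\
&= (\delta V)^{-2} \int_{\delta V} d^3 r_1 \int_{\delta V} d^3 r_2\, \langle \delta(\vec r_1 - \vec x)\, \delta(\vec r_2 - \vec x) \rangle \nonumber \\
&= (\delta V)^{-2} \int_{\delta V} d^3 r_1 \int_{\delta V} d^3 r_2\, \xi(|\vec r_1 - \vec r_2|), \tag{14}
\end{align}

is the variance of the density perturbation smoothed over a spherical
top hat of radius $r_p$. If the correlation function can be approximated
by a power law, $\xi(r) \propto r^{-\alpha}$, for $r < r_p$, then

\begin{equation}
\langle \delta_p^2 \rangle = \frac{9}{2}\, \xi(r_p) \int_0^1 x_1^2\, dx_1 \int_0^1 x_2^2\, dx_2 \int_{-1}^{1} d\mu\, \frac{1}{(x_1^2 + x_2^2 - 2 x_1 x_2 \mu)^{\alpha/2}}. \tag{15}
\end{equation}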



For $\alpha = 0$, the integral evaluates to 2/9. And for $\alpha = 1$, 2, and 3,
it evaluates to 0.27, 0.50, and 5.0, respectively. The integral is 0.41
for $\alpha = 1.8$.
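
Several of the quoted values can be cross-checked with a short calculation. For a power law, the dimensionless integral $I(\alpha) = \int_0^1 x_1^2 dx_1 \int_0^1 x_2^2 dx_2 \int_{-1}^{1} d\mu\, (x_1^2 + x_2^2 - 2 x_1 x_2 \mu)^{-\alpha/2}$ equals 2/9 times the mean of $s^{-\alpha}$ over the separations $s$ of two random points in a unit sphere. The sketch below relies on the standard geometric result for that separation distribution, $p(s) = 3 s^2 (1 - 3s/4 + s^3/16)$ for $0 \le s \le 2$, which is an outside ingredient rather than something stated in this paper; the resulting closed form converges only for $\alpha < 3$. Function names are illustrative.

```python
def pair_integral_closed(alpha):
    """Closed form of I(alpha), obtained by integrating s^-alpha against p(s); needs alpha < 3."""
    return 16.0 * 2.0 ** (-alpha) / ((3.0 - alpha) * (4.0 - alpha) * (6.0 - alpha))


def pair_integral_numeric(alpha, n_steps=100_000):
    """Midpoint-rule evaluation of (2/9) * <s^-alpha> as an independent check (alpha <= 2)."""
    ds = 2.0 / n_steps
    total = 0.0
    for i in range(n_steps):
        s = (i + 0.5) * ds
        pdf = 3.0 * s**2 * (1.0 - 0.75 * s + s**3 / 16.0)  # separation PDF in a unit sphere
        total += pdf * s ** (-alpha) * ds
    return (2.0 / 9.0) * total


for alpha in (0.0, 1.0, 1.8, 2.0):
    print(f"alpha = {alpha:.1f}: closed = {pair_integral_closed(alpha):.4f}, "
          f"numeric = {pair_integral_numeric(alpha):.4f}")
```

Multiplying $I(\alpha)$ by 9/2 gives the ratio $\langle \delta_p^2 \rangle / \xi(r_p)$, which is unity for $\alpha = 0$ and grows as the correlation function steepens.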

To begin, we take the radius of the sphere so that the probability
to find three or more galaxies is small compared with that to find
two. Roughly speaking (neglecting corrections from higher-order
clustering that will become apparent below), this requires the
probability to find two galaxies in $\delta V$ to be small compared with that to
find one. We thus require the radius $r_p$ to be chosen small enough
so that $n\, \delta V\, (1 + \langle \delta_p^2 \rangle) < 1$, or usually just $n\, \delta V\, \langle \delta_p^2 \rangle < 1$, since
we will often have $\langle \delta_p^2 \rangle > 1$.

If two galaxies fall within the same radius-$r_p$ sphere, then we
call this a pair. If $\delta P$ [cf. equation (13)] is the probability to find
two galaxies in a volume $\delta V$, and if $\delta P \ll 1$, then the spatial
density $n_2$ of pairs is $\delta P/\delta V = n^2 (\delta V) (1 + \langle \delta_p^2 \rangle)$. The pair-pair
autocorrelation function $X(r)$, the excess probability over random
to find a pair in each of two volumes $\delta V_1$ and $\delta V_3$ separated by a
distance $r_{13}$, is defined by

\begin{align}
\delta P &= n_2^2\, \delta V_1\, \delta V_3\, [1 + X(r_{13})] \nonumber \\
&= n^4 (\delta V_1)^2 (\delta V_3)^2 \left( 1 + \langle \delta_p^2 \rangle \right)^2 [1 + X(r_{13})], \tag{16}
\end{align}

where $\delta P$ is here the joint probability to find one pair in $\delta V_1$ and
another in $\delta V_3$.

A pair of pairs is a quadruplet. To describe the clustering of 
pairs of galaxies, we will therefore need the four-point correlation 
function. The joint differential probability to find objects in
differential volume elements $dV_1$, $dV_2$, $dV_3$, and $dV_4$ located,
respectively, at positions $\vec r_1$, $\vec r_2$, $\vec r_3$, and $\vec r_4$, is (Peebles 1980)

\begin{align}
\delta P = n^4\, dV_1\, dV_2\, dV_3\, dV_4 \times [ & 1 + \xi_{12} + 5\ \mathrm{permutations} \nonumber \\
& + \zeta(\vec r_1, \vec r_2, \vec r_3) + 3\ \mathrm{permutations} \nonumber \\
& + \xi_{12}\, \xi_{34} + 2\ \mathrm{permutations} \nonumber \\
& + \eta(\vec r_1, \vec r_2, \vec r_3, \vec r_4) ]. \tag{17}
\end{align}

Here, $\zeta(\vec r_1, \vec r_2, \vec r_3)$ is the reduced (or "connected") three-point
correlation function, $\eta$ is the reduced (connected) four-point correlation
function, and we have introduced the shorthands $r_{ij} = |\vec r_i - \vec r_j|$
and also $\xi_{ij} = \xi(|\vec r_i - \vec r_j|)$. The quantity in brackets is the
complete four-point autocorrelation function. For Gaussian perturbations,
$\zeta = \eta = 0$.

To find the pair autocorrelation function, we now consider the 
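case where two of the galaxies (1 and 2) are in one volume ($\delta V_1$)
centered at $\vec r_1$ and the other two (3 and 4) are in another ($\delta V_3$)
centered at $\vec r_3$. We also assume that the separation $|\vec r_1 - \vec r_3| \gg r_p$.
The joint probability to find two galaxies in $\delta V_1$ and two in $\delta V_3$ is
thus

\begin{align}
\delta P = n^4 \int_{\delta V_1} d^3 x_1 \int_{\delta V_1} d^3 x_2 \int_{\delta V_3} d^3 x_3 \int_{\delta V_3} d^3 x_4 \times [ & 1 + \xi_{12} + \xi_{34} + 4 \xi_{13} + \xi_{12}\, \xi_{34} \nonumber \\
& + 2 \xi_{13}^2 + 2 \zeta(r_{12}, r_{13}, r_{13}) + 2 \zeta(r_{34}, r_{13}, r_{13}) \nonumber \\
& + \eta(r_{12}, r_{13}, r_{14}, r_{23}, r_{24}, r_{34}) ]; \tag{18}
\end{align}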



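note that in this equation (only), $\xi_{12} = \xi(|\vec x_1 - \vec x_2|)$ and similarly
for $\xi_{34}$. We next note that

\begin{equation}
\int_{\delta V_1} d^3 x_1 \int_{\delta V_1} d^3 x_2 \int_{\delta V_3} d^3 x_3\, \zeta(\vec x_1, \vec x_2, \vec x_3) = (\delta V)^3\, \langle \delta_p^2(\vec r_1)\, \delta_p(\vec r_3) \rangle_c, \tag{19}
\end{equation}

the (reduced) three-point correlation function (with two of the three
points coincident) for the smoothed density field, and

\begin{equation}
\int_{\delta V_1} d^3 x_1 \int_{\delta V_1} d^3 x_2 \int_{\delta V_3} d^3 x_3 \int_{\delta V_3} d^3 x_4\, \eta(\vec x_1, \vec x_2, \vec x_3, \vec x_4) = (\delta V)^4\, \langle \delta_p^2(\vec r_1)\, \delta_p^2(\vec r_3) \rangle_c, \tag{20}
\end{equation}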

a (reduced) four-point correlation function. Equating equations (16)
and (18), we find

\begin{equation}
X(r) = \frac{4 \xi(r) + 2 \xi^2(r) + 4 \langle \delta_p^2(\vec x)\, \delta_p(\vec x + \vec r) \rangle_c + \langle \delta_p^2(\vec x)\, \delta_p^2(\vec x + \vec r) \rangle_c}{\left( 1 + \langle \delta_p^2 \rangle \right)^2}. \tag{21}
\end{equation}

This expression holds for $n\, \delta V\, (1 + \langle \delta_p^2 \rangle) \ll 1$, and it is valid for any
galaxy-galaxy two-point, three-point, and four-point autocorrelation functions. We
thus find that the calculation of the pair correlation function reduces
to the calculation of the correlation of the density $\delta_p$ with $\delta_p^2$ and
the autocorrelations of $\delta_p^2$, a result that should come as no surprise.

We will define the effective pair bias via $b_p^2 = [X(r)/\xi(r)]$; it
is the excess bias of pairs relative to individual objects. Note then
that, in the language of Section 2, the net merger bias is $b_m = b_h b_p$.



4 PAIR CLUSTERING FOR GAUSSIAN PERTURBATIONS

For Gaussian perturbations, $\zeta = \eta = 0$ and the pair-pair
autocorrelation function simplifies to

\begin{equation}
X(r) = \frac{4 \xi(r) + 2 [\xi(r)]^2}{\left( 1 + \langle \delta_p^2 \rangle \right)^2}. \tag{22}
\end{equation}



In the limit of weak correlations, $\langle \delta_p^2 \rangle, \xi \ll 1$, $X(r) \simeq 4 \xi(r)$.
This is easy to understand: given two galaxies in the first cell, each
contributes a factor $\xi(r)$ to the excess probability to find a galaxy
in the second cell (at least to linear order), and for $X(r)$ there are
two such galaxies in the second cell. Although of interest academically,
this limit is probably not relevant for galaxies or clusters of
galaxies, as a value $\langle \delta_p^2 \rangle < 1$ requires that we deal with objects
that are so rare that their mean separations are > Mpc.

If $\xi(r) \ll 1$ and $\langle \delta_p^2 \rangle \gtrsim 1$, then the clustering of pairs is
suppressed relative to that of individual galaxies, a consequence of
the scarcity of pairs relative to individual galaxies. In the limit of
strong clustering, $\xi(r), \langle \delta_p^2 \rangle \gg 1$, the pair correlation function
becomes $X(r) \simeq 2 [\xi(r)]^2 / \langle \delta_p^2 \rangle^2$, which is again suppressed relative
to the galaxy correlation function. The applicability of this limit,
however, should be questioned, as $\xi > 1$ generally implies
non-Gaussian perturbations. Interestingly, this simple exercise implies
that merger bias can operate in different directions, depending on
the regime of interest, as indeed the simulations discussed above
find.
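
The limits discussed in this Section can be checked directly from equation (22). The following is a minimal sketch: the function name `pair_xi_gaussian` and the sample values of $\xi$ and $\langle \delta_p^2 \rangle$ are illustrative assumptions chosen only to land in each regime.

```python
def pair_xi_gaussian(xi, var_p):
    """Pair-pair correlation X(r) for Gaussian perturbations, equation (22).

    xi    : galaxy two-point correlation function at the pair-pair separation r
    var_p : smoothed variance <delta_p^2> within a sphere of radius r_p
    """
    return (4.0 * xi + 2.0 * xi**2) / (1.0 + var_p) ** 2


# Weak correlations: X approaches 4 * xi.
print(pair_xi_gaussian(1e-3, 1e-3) / 1e-3)

# Weak large-scale correlation but <delta_p^2> above unity: X falls below xi.
print(pair_xi_gaussian(0.01, 3.0) / 0.01)

# Strong clustering: X approaches 2 * xi^2 / <delta_p^2>^2.
xi, var_p = 100.0, 1000.0
print(pair_xi_gaussian(xi, var_p), 2.0 * xi**2 / var_p**2)
```

The square root of the first two printed ratios is the effective pair bias $b_p$ in each regime.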



5 CLUSTERING OF GAUSSIAN PEAKS 

We have just seen that if objects trace the distribution of mass 
in a system with Gaussian perturbations with some specified cor- 
relation function, then the pair correlation function is suppressed 
relative to the normal correlation function, unless the correlations 
are weak, in which case it can be enhanced by up to a factor of 
4. If, however, objects form only at high-density peaks of a pri- 
mordial density distribution, then the distribution of these objects 
will be non-Gaussian. That this is true is easy to see. The one- 
point probability distribution function for Gaussian perturbations 
is $P(\delta) \propto e^{-\delta^2/2\sigma^2}$, where $\sigma^2$ is the variance. This distribution
has zero mean, no skewness, no kurtosis, and no higher-order (reduced)
cumulants. The one-point probability distribution of high-density
peaks is $P(\delta) \propto e^{-\delta^2/2\sigma^2}$ for $\delta > \nu\sigma$ and $P(\delta) = 0$ for
$\delta < \nu\sigma$. This distribution has nonzero mean, nonzero skewness,
kurtosis, etc.

This non-Gaussianity introduces non-zero reduced three-point
and four-point correlation functions (Politzer & Wise 1984;
Bardeen et al. 1986; Jensen & Szalay 1986; Melott & Fry 1986),
even if the total-density-perturbation amplitude is linear, $\xi < 1$.
Although the exact expressions can be complicated, they simplify
considerably when $\nu \gg 1$. In this limit, the full $n$-point correlation
function can be written in terms of the galaxy two-point correlation
function $\xi_g(r)$ as (Politzer & Wise 1984)

\begin{equation}
1 + \xi^{(n)}(\vec r_1, \ldots, \vec r_n) \simeq \prod_{i > j} \left[ \xi_g(r_{ij}) + 1 \right]. \tag{23}
\end{equation}

The galaxy correlation function $\xi_g(r)$ is the correlation function for
the objects, rather than for the total mass. Thus, we can have $\xi_g > 1$
even in the linear regime, $\xi(r) < 1$, if the objects are highly biased
tracers of the mass distribution. In this case, we can simply replace
the expression in brackets in equation (18) with $[1 + \xi^{(4)}]$ from
equation (23). Then, the pair autocorrelation function becomes

\begin{equation}
X(r) \simeq [1 + \xi_g(r)]^4 - 1. \tag{24}
\end{equation}



This equation is the central result of this Section. It says that, if
objects trace the distribution of peaks in a Gaussian density distribution,
then the clustering of pairs can be strongly enhanced relative
to the clustering of individual objects. Equation (24) is valid for
highly-biased objects ($\nu \gg 1$) on scales at which the underlying
matter fluctuations are linear (even if fluctuations in the population
of the objects are not small; Politzer & Wise 1984). It thus applies
to haloes well above the characteristic mass scale (such as
submillimetre galaxies at z ~ 3 or extremely massive clusters at the
present day). Physically, higher-order clustering of high-density
peaks (in particular, the four-point correlation function from
equation (23), which provides nonzero reduced three- and four-point
functions) is enhanced with this type of non-Gaussianity, and this favors
the clustering of pairs over individual objects. Thus, if mergers can
be equated with close pairs of galaxies, we do expect a significant
merger bias in the limit $\nu \gg 1$.
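
To give a feel for the size of this enhancement, the sketch below evaluates the high-peak pair correlation, assuming it takes the form $X(r) \simeq [1 + \xi_g(r)]^4 - 1$ that follows from inserting the product formula of equation (23) into equation (18) for well-separated cells. The function names and the sample values of $\xi_g$ are illustrative.

```python
import math


def pair_xi_peaks(xi_g):
    """Pair-pair correlation for high peaks, X = (1 + xi_g)^4 - 1."""
    return (1.0 + xi_g) ** 4 - 1.0


def pair_bias_peaks(xi_g):
    """Effective pair bias b_p = sqrt(X / xi_g)."""
    return math.sqrt(pair_xi_peaks(xi_g) / xi_g)


for xi_g in (0.01, 0.1, 1.0):
    print(f"xi_g = {xi_g:.2f}: X = {pair_xi_peaks(xi_g):.3f}, b_p = {pair_bias_peaks(xi_g):.2f}")
```

For small $\xi_g$ the pair bias tends to 2, matching the weak-correlation Gaussian limit, and it rises quickly once $\xi_g$ approaches unity.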



6 QUASILINEAR PERTURBATIONS 

Equation (21) shows that the pair correlation depends on the three-
and four-point correlation functions. The previous Section showed 
that such terms do appear if galaxies are associated with peaks 
in the density field. However, another way to produce non-zero 
higher-order correlations is through gravitational processes, and 
it is interesting to consider how such processes could affect pair 
correlations (and hence the merger bias). We therefore next con- 
sider objects that are distributed like the mass for a non-Gaussian 
mass distribution produced by gravitational amplification, to the 
quasilinear regime, of primordial Gaussian perturbations. At redshift
z = 0, the quasilinear regime occurs at ~ 10 Mpc; at redshift
z = 3, it occurs at ~ 1 Mpc. The bispectrum and trispectrum for 
this case can be calculated from cosmological perturbation theory 
and from them the three- and four-point correlation functions. The 
expressions can be quite formidable iGoroff etalJll986h. but for- 
tunately for us, Bernardeau (1996; see also lBemarfeau et all2002t) 
has calculated the quantities required here. In particular, in the non- 
linear regime, 



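$$\langle \delta_p^2(\vec{x}_1)\,\delta_p(\vec{x}_2) \rangle_c = C_{2,1}\,\langle \delta_p^2 \rangle\,\xi(|\vec{x}_1 - \vec{x}_2|), \qquad (25)$$

and

$$\langle \delta_p^2(\vec{x}_1)\,\delta_p^2(\vec{x}_2) \rangle_c = C_{2,1}^2\,\langle \delta_p^2 \rangle^2\,\xi(|\vec{x}_1 - \vec{x}_2|), \qquad (26)$$

where

$$C_{2,1} = \frac{68}{21} + \frac{1}{3}\,\frac{d\log\langle \delta_p^2 \rangle}{d\log r_p}. \qquad (27)$$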



In the limit that $\langle \delta_p^2 \rangle \gg 1, \xi$, we find

$$X(r) \simeq C_{2,1}^2\,\xi(r). \qquad (28)$$



We note that $d\log\langle \delta_p^2 \rangle / d\log r_p = d\log\xi / d\log r$.
For the scales probed by Lyman-break galaxies, the linear-theory
correlation function is roughly $\xi \propto r^{-2}$, while stable
clustering leads to a correlation function $\xi(r) \propto r^{-1.8}$.
For these correlation-function scalings, $X(r) \sim 7\,\xi(r)$; i.e.,
pairs are biased by roughly a factor of 2.6 relative to galaxies. If,
on the other hand, $\xi(r) \propto$ constant at small radii (as
expected for $P(k) \propto k^n$ with $n = -3$), then
$X(r) \sim 10\,\xi(r)$. We thus find that in the quasilinear regime,
pairs can be biased, perhaps strongly so, compared with the individual
objects, even if they trace the mass. This could further enhance the
clustering of mergers, if they are associated with pairs of objects.
We emphasize that equation (28) is applicable on scales at which
the underlying mass perturbations have $\xi \sim 1$ and assumes that
the objects of interest exactly trace the mass distribution. It is
thus only directly applicable in the limited regime of relatively
unbiased objects on moderately small scales, although the qualitative
results likely apply to more biased objects as well (see the
discussion at the end of Section 7).
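
The numbers quoted above can be checked with a few lines of Python. This is only an illustrative sketch of equations (27) and (28): the function names are our own, and the code assumes the limit $\langle \delta_p^2 \rangle \gg 1, \xi$ and a pure power-law correlation function with logarithmic slope $d\log\xi/d\log r$.

```python
# Sketch of equations (27) and (28); all function names are illustrative.

def c21(dlog_xi_dlog_r):
    """Equation (27): C_{2,1} = 68/21 + (1/3) dlog<delta_p^2>/dlog r_p,
    with dlog<delta_p^2>/dlog r_p = dlog xi/dlog r as noted in the text."""
    return 68.0 / 21.0 + dlog_xi_dlog_r / 3.0


def pair_to_single_ratio(dlog_xi_dlog_r):
    """Equation (28): X(r)/xi(r) ~ C_{2,1}^2 (limit <delta_p^2> >> 1, xi)."""
    return c21(dlog_xi_dlog_r) ** 2


def pair_bias(dlog_xi_dlog_r):
    """Bias of pairs relative to single objects, sqrt(X/xi) = |C_{2,1}|."""
    return abs(c21(dlog_xi_dlog_r))


if __name__ == "__main__":
    # Slopes discussed in the text: r^-2, r^-1.8, and xi = constant.
    for slope in (-2.0, -1.8, 0.0):
        ratio = pair_to_single_ratio(slope)
        bias = pair_bias(slope)
        print(f"dlog xi/dlog r = {slope:+.1f}:  X/xi = {ratio:.2f},  b_p = {bias:.2f}")
```

For slopes of $-2$ and $-1.8$ the ratio $X/\xi$ comes out between about 6.6 and 7.0 (a pair bias near 2.6), and for a flat correlation function it is about 10.5, in line with the rounded values in the text.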



7 HALO CLUSTERING MODEL 

We will now briefly consider pair clustering in the highly nonlinear 
regime. In this case, perturbation theory is no longer appropriate, 
so we will turn to the halo model of the density field. The halo 
clustering model postulates a distribution of virialized dark-matter 
haloes, each with a radial ($r$) density profile $\rho_h(m; r)$ that depends
on its mass m. On large scales, the clustering is that of biased peaks, 
possibly in the quasilinear regime, which we already considered 
above. On nonlinear scales, the clustering is described within indi- 
vidual haloes. Of course, in this "one-halo" regime, the distribution 
of objects is ultimately due to the interactions between them (such 
as dynamical friction acting on satellite galaxies). Our treatment is 
thus only approximate: it predicts the clustering of pairs given a 
density profile and implicitly ignores interactions. It could, never- 
theless, be useful inside clusters of galaxies in which a population 
of small "tracer" haloes orbit in a potential dominated by the mas- 
sive cluster. 

For the purposes of illustration, we suppose that all haloes
have the same mass and power-law radial density profile:
$\rho \propto r^{-\gamma}$ for $r < R$, and $\rho(r) = 0$ for $r > R$.
We will only consider correlations on small scales, within an
individual halo (which should be appropriate on small scales in the
highly nonlinear regime). The autocorrelation function for the mass is
then (Scherrer & Bertschinger 1991; Cooray & Sheth 2002)



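$$\xi(|\vec{r}_1 - \vec{r}_2|) = \frac{\langle \rho(\vec{r}_1)\,\rho(\vec{r}_2) \rangle}{\langle \rho \rangle^2} - 1, \qquad (29)$$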



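where the angle brackets denote an average over all space. The
mean density is $\langle \rho \rangle = n_{\rm halo} M$, where
$n_{\rm halo}$ is the spatial number density of haloes and $M$ is the
halo mass, and

$$\langle \rho(\vec{r}_1)\,\rho(\vec{r}_2) \rangle = n_{\rm halo} \int d^3x\, \rho(|\vec{r}_1 - \vec{x}|)\,\rho(|\vec{r}_2 - \vec{x}|). \qquad (30)$$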



The integral in equation (30) is particularly simple at zero lag,
where the autocorrelation function for the mass is

$$\xi(r = 0) = (4\pi n_{\rm halo} R^3)^{-1}\,\frac{(3 - \gamma)^2}{3 - 2\gamma} - 1, \qquad (31)$$
for $\gamma < 3/2$. For $\gamma > 3/2$, the divergence at the
$r \to 0$ limit of the integrand can be tempered by measuring
correlations over a finite smoothing volume of radius $r_s$ (as would
occur in any physical observation). Thus, for $\gamma > 3/2$, we find

$$\xi(r = 0) = (4\pi n_{\rm halo} R^3)^{-1}\,\frac{(3 - \gamma)^2}{2\gamma - 3} \left(\frac{r_s}{R}\right)^{3 - 2\gamma} - 1. \qquad (32)$$

For $r < R$ and $\gamma > 3/2$ (and for $n_{\rm halo} R^3 \ll 1$), the
mass correlation function scales with radius $r$ as
$\xi(r) \propto r^{3 - 2\gamma}$; for $\gamma < 3/2$, it decreases less
rapidly with radius. For $\gamma = 3/2$, the power laws are replaced by
logarithms.

The pair correlation function follows simply by noting that
pairs are distributed in the halo as $\rho^2$. We can therefore simply
replace $\gamma \to 2\gamma$ in the results for the mass correlation
functions. Thus, for $\gamma < 3/4$, the zero-lag pair correlation
function is

$$X(r = 0) = (4\pi n_{\rm halo} R^3)^{-1}\,\frac{(3 - 2\gamma)^2}{3 - 4\gamma} - 1, \qquad (33)$$

and for $\gamma > 3/4$,

$$X(r = 0) = (4\pi n_{\rm halo} R^3)^{-1}\,\frac{(3 - 2\gamma)^2}{4\gamma - 3} \left(\frac{r_s}{R}\right)^{3 - 4\gamma} - 1. \qquad (34)$$

The pair correlation function scales, for $\gamma > 3/4$, with radius
$r$ as $X(r) \propto r^{3 - 4\gamma}$, and it decreases less rapidly
with $r$ for $\gamma < 3/4$. For $3/4 < \gamma < 3/2$, the pair
correlation diverges (modulo the smoothing) at small radii, while the
mass correlation function approaches a constant as $r \to 0$.

We thus see that the distributions of pairs and mass differ, and
thus that there should be a (scale-dependent) bias between them.
Our calculation is applicable in the nonlinear regime, when the
correlation function is measured at distances $r \ll R$. The pair bias
can then be approximated by the square root of the ratio of the
zero-lag pair and mass correlation functions. For example, if
$\gamma = 1/2$, then the pair bias evaluates to
$b_p = 2^{5/2}/5 \simeq 1.1$. For $\gamma \to 0$, the pair bias
approaches 1, which is what we expect for objects distributed uniformly
in a halo. The zero-lag bias may be considerably larger for
$3/4 < \gamma < 3/2$, when the pair correlation function diverges as
$r \to 0$, while the mass correlation does not.
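
The $\gamma = 1/2$ example can be reproduced numerically. The sketch below is only illustrative: the function names are our own, it covers only the convergent cases ($\gamma < 3/2$ for the mass, $\gamma < 3/4$ for pairs), and the value of $n_{\rm halo} R^3$ is an arbitrary small number chosen to represent the limit $n_{\rm halo} R^3 \ll 1$.

```python
import math

# Sketch of the zero-lag results, equations (31) and (33).
# Function names and the default n_halo_R3 are illustrative assumptions.


def zero_lag_mass_xi(gamma, n_halo_R3):
    """Equation (31): zero-lag mass correlation for rho ~ r^-gamma, gamma < 3/2."""
    if not gamma < 1.5:
        raise ValueError("equation (31) requires gamma < 3/2")
    return (3.0 - gamma) ** 2 / ((3.0 - 2.0 * gamma) * 4.0 * math.pi * n_halo_R3) - 1.0


def zero_lag_pair_xi(gamma, n_halo_R3):
    """Equation (33): pairs follow rho^2, so replace gamma -> 2*gamma (gamma < 3/4)."""
    if not gamma < 0.75:
        raise ValueError("equation (33) requires gamma < 3/4")
    return zero_lag_mass_xi(2.0 * gamma, n_halo_R3)


def pair_bias(gamma, n_halo_R3=1e-9):
    """Pair bias estimated as sqrt(X(0) / xi(0))."""
    return math.sqrt(zero_lag_pair_xi(gamma, n_halo_R3) / zero_lag_mass_xi(gamma, n_halo_R3))


if __name__ == "__main__":
    for gamma in (0.0, 0.25, 0.5, 0.7):
        print(f"gamma = {gamma:.2f}:  b_p = {pair_bias(gamma):.3f}")
```

In the limit $n_{\rm halo} R^3 \ll 1$ the $-1$ terms are negligible and the ratio reduces to $(3 - 2\gamma)^3 / [(3 - 4\gamma)(3 - \gamma)^2]$, which gives $2^{5/2}/5$ at $\gamma = 1/2$ and unity at $\gamma = 0$.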

So far, we have considered pair correlations for a highly biased
population in the linear regime as well as for a population
that traces the mass in the quasilinear regime and in the nonlinear
regime. What about pair correlations for a highly biased population
in the quasilinear or nonlinear regimes? It has been argued that
in the quasilinear regime, highly biased tracers are more likely to
be found in denser regions (Cole & Kaiser 1989; Sheth & Tormen
1999); calculation of the pair correlation for a population biased
in Lagrangian space evolved into the quasilinear regime could be
done following the techniques of Fry (1996), Catelan et al. (1998),
and Catelan, Porciani, & Kamionkowski (2000), but we leave that
for future work. And what about the nonlinear regime? Numerical
simulations have suggested that the distribution of primordial
density peaks in larger virialized haloes (i.e., the nonlinear
regime) is more highly peaked toward the centers than the mass as
a whole (Santos 2003; Moore et al. 1991; White & Springel 2000;
Diemand, Madau, & Moore 2005). If so, and if, as we have seen,
the bias of pair correlations is enhanced with steeper density
profiles, then the bias of pair correlations for rare objects in the
quasilinear and nonlinear regimes may be even further enhanced.



8 DISCUSSION 

In this paper, we have investigated the implications of the extended
Press-Schechter and Mo & White (1996) biasing scheme for
merger bias and pointed out some shortcomings and ambiguities in 
this approach. In particular, we showed that this approach yields no 
merger bias, but only because it explicitly ignores the variation of 
merger rates with the large-scale density field. We then showed that 
a simple model in which the merger rate scales only with the halo 
abundances predicts that mergers of massive galaxies will be more 
biased than the halo population but that mergers of small galax- 
ies will be less biased. Furthermore, the merger bias will evolve 
significantly with redshift. These may provide useful clues to
reconciling the various simulations (Scannapieco & Thacker 2003;
Percival et al. 2003; Gao, Springel, & White 2005). However, these
techniques are clearly inadequate for understanding merger bias on 
any quantitative level (at least until a self-consistent merger kernel 
is available). 

We therefore moved on to hypothesize that close pairs in a 
clustering model are likely to yield mergers. We thus studied the 
clustering of close pairs in a variety of models in which objects 
Poisson sample (1) the mass in a Gaussian random field; (2) the 
high-density peaks in a Gaussian random field; (3) the mass in 
the quasilinear regime; and (4) the mass in virialized haloes with 
power-law density profiles. We find that in many (though not all) 
cases, close pairs can be more highly clustered than individual ob- 
jects. If so, and if close pairs are likely to lead to mergers, then the 
clustering of objects that have undergone recent mergers can be en- 
hanced relative to the clustering of individual haloes of comparable 
masses. We have thus shown that, in the simplest picture of mergers,
an extra bias (of some magnitude) is generic to most clustering
models. The actual magnitude of the bias (or the lack of it, as
in the simulations of Percival et al. 2003; Gao, Springel, & White
2005) is therefore revealing something fundamental about the
halo-merging process, an area in need of substantial theoretical
insight (Benson et al. 2005).

Even if we do identify close pairs with mergers, there are still 
a multitude of theoretical steps — each fraught with considerable 
uncertainties — that must be taken to connect close pairs of galactic 
haloes with, e.g., the observational constraints on LBGs. We have 
considered the behavior under a variety of limits, but the more gen- 
eral case must be treated numerically. Still, it is interesting to inves- 
tigate whether pair biasing might be in the right ballpark to account 
for the discrepancy between the LBG dynamical and clustering 
masses. According to Adelberger et al. (1998), the bias of LBGs is
$b_{\rm LBG} \sim 4.0$, roughly consistent with that expected for
$\sim 10^{12}\,M_\odot$ objects (see also Adelberger et al. 2005, who
estimate a similar median mass for a larger sample of objects at
$z = 3$). Although the abundance of haloes with such masses is
consistent with the abundance of LBGs, it requires that every such
halo house a galaxy that produces stars at a prodigious rate
(Adelberger & Steidel 2000). On the other hand, the linewidths and
kinematics of LBGs suggest masses closer to $\sim 10^{11}\,M_\odot$
(Pettini et al. 2001; Erb et al. 2003). Haloes of these masses have a
much higher abundance, allowing consistency with the LBG abundance if
the efficiency for $\sim 10^{11}\,M_\odot$ haloes to produce extremely
luminous objects is relatively low, $\sim 10\%$. This is
understandable, perhaps, if only recent mergers of
$\sim 10^{11}\,M_\odot$ haloes produce LBGs. (An alternate possibility
is that dynamical mass measurements are only sensitive to a small
fraction of the halo and that LBGs are ubiquitous in large dark matter
haloes; Cooray 2005.)

The only remaining problem with the small-mass LBG scenario is why
the bias $b_{\rm LBG} \sim 4$ is so much larger than the bias
$b_h \sim 2.4$ expected for a sample with $\sim 10^{11}\,M_\odot$
haloes. Adelberger et al. (1998) measure the clustering through a
counts-in-cells analysis within boxes of size $11.4\,h_{100}^{-1}$ Mpc.
This is within the linear regime at redshifts $z \sim 3$, and with an
expected bias $b_h \approx 2.4$, the variance in the
$\sim 10^{11}\,M_\odot$ halo distribution is
$\sigma_{\rm gal} \sim 0.8$. It is also reasonable to assume a pair
spacing with $\langle \delta_p^2 \rangle \gg 1$. Although the pair
bias implied by equation (24) is not linear, most of the weight for
the counts-in-cells analysis occurs at the largest radii. We thus
estimate from equation (24) a pair bias (i.e., the extra biasing of
mergers relative to the objects themselves) of
$b_p = \sqrt{X(r)/\xi_g(r)} \sim 3.4$. This is more than enough to make
the net merger bias ($b_m = b_p b_h$) comparable to $b_{\rm LBG}$.
However, note that $\nu \sim 1.6$ for $10^{11}\,M_\odot$ haloes, so
the true amplification should be smaller than the $\nu \gg 1$ limit we
have taken. This may be further augmented by quasilinear effects,
which could contribute a comparable pair bias over some fraction of
the cell.
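
To make the arithmetic behind this comparison explicit, the following short sketch uses only the bias values quoted above; the variable names are our own, and the $\nu \gg 1$ pair bias is treated as an upper limit, as the text cautions.

```python
# Bias values quoted in the text; variable names are illustrative.
b_lbg = 4.0      # observed LBG bias (Adelberger et al. 1998)
b_h = 2.4        # expected bias of ~1e11 M_sun haloes
b_p_limit = 3.4  # pair bias estimated from equation (24) in the nu >> 1 limit

# Pair bias needed for mergers of ~1e11 M_sun haloes to match the LBG bias,
# using b_m = b_p * b_h.
b_p_required = b_lbg / b_h

# Net merger bias if the nu >> 1 estimate applied in full.
b_m_limit = b_p_limit * b_h

print(f"required pair bias:        {b_p_required:.2f}")
print(f"merger bias (nu>>1 limit): {b_m_limit:.2f}")
```

The required pair bias is about 1.7, roughly half of the $\nu \gg 1$ estimate, so there is room for the amplification to be substantially weaker at $\nu \sim 1.6$ and still account for the observed LBG bias.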

A similar, though perhaps even more desperate, problem occurs
for submillimetre-selected galaxies. Blain et al. (2004) claim
that the clustering of these galaxies indicates halo masses of
$\sim 10^{13}\,M_\odot$, while kinematic measurements yield values an
order of magnitude smaller, even allowing for the mass in the outer
regions of the halo. Our results may help resolve this discrepancy as
well, if submillimetre galaxies are the products of recent mergers.
Moreover, Blain et al. (2004) measured clustering through the rate of
incidence of close pairs in their survey fields. They assumed a
correlation function of fixed shape $\xi_g(r) \propto r^{-1.8}$ and
varied its amplitude until they recovered the observed number of
pairs; the inferred correlation length could then be matched to a halo
mass. We have
shown that the clustering of pairs is not the same as the clustering 
of the underlying objects and depends on the underlying halo pop- 
ulation, the scales of interest, and even the relation of haloes to the 
underlying density field. The effective pair bias can be significantly 
larger than the bias of the haloes themselves, so pair-counting tech- 
niques must be approached with care. The precise effects are diffi- 
cult to predict given the "pencil-beam" geometries of their surveys, 
but they certainly merit further study. 

Before closing, we note that our results may be applicable
elsewhere as well. For example, galaxy clusters are highly biased
tracers of the mass distribution today (Bahcall et al. 2003). Their
correlation length may be as large as $\sim 25\,h_{100}^{-1}$ Mpc, as
opposed to a correlation length $\sim 5$–$7\,h_{100}^{-1}$ Mpc for the
mass. If this bias occurs because clusters form at peaks of the
primordial density distribution, then they should experience
higher-order clustering as described in Section 5. Moreover, at
distances $> 10\,h_{100}^{-1}$ Mpc, quasilinear effects should be
small. There will thus be testable predictions for the clustering of
close pairs of clusters, or, if pairs are associated with mergers, for
the clustering of recently merged clusters. As another example,
non-trivial merger bias would modify the interpretation of AGN
clustering (provided that they are fueled by merger activity). This
would be particularly important for understanding their host
properties and their lifetimes (La Franca, Andreani, & Cristiani 1998;
Haiman & Hui 2001; Martini & Weinberg 2001; Adelberger & Steidel
2005b). We leave further discussion of these possibilities to future
work.

We thank the referee, L. Miller, for helpful comments. This 
work was supported in part by DoE DE-FG03-92-ER40701 and 
NASA NNG05GF69G. 